Unlike tabular data, features in network data are interconnected within a domain-specific graph. Examples of this setting include gene expression overlaid on a protein interaction network (PPI) and user opinions in a social network. Network data is typically high-dimensional (large number of nodes) and often contains outlier snapshot instances and noise. In addition, it is often non-trivial and time-consuming to annotate instances with global labels (e.g., disease or normal). How can we jointly select discriminative subnetworks and representative instances for network data without supervision? We address these challenges within an unsupervised framework for joint subnetwork and instance selection in network data, called UISS, via a convex self-representation objective. Given an unlabeled network dataset, UISS identifies representative instances while ignoring outliers. It outperforms state-of-the-art baselines on both discriminative subnetwork selection and representative instance selection, achieving up to 10% accuracy improvement on all real-world data sets we use for evaluation. When employed for exploratory analysis in RNA-seq network samples from multiple studies it produces interpretable and informative summaries.
translated by 谷歌翻译
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
translated by 谷歌翻译
在这项工作中,我们提出了一种从随机已知的日志中恢复的算法,这种设置越来越普遍,随着传感器数量的增加和产生不确定数据的预测模型。建议的方法计算过程模型与随机已知的痕迹之间的符合性,并在此随机迹线中恢复最佳对齐,作为真实痕迹。本文提供了各种成本模型对痕量恢复准确性的影响的分析,并利用产品多刷子来比较替代的跟踪恢复选项。我们使用两个公开数据集进行评估的方法的平均准确性令人印象深刻,平均恢复精度得分为90-97%,显着改善了一种共同的启发式,它为每种不确定的活动选择了最可能的价值。我们认为,拟议算法从随机已知的日志中恢复正确的痕迹的有效性可能是在不确定的环境中开发可靠的决策工具的有力帮助。
translated by 谷歌翻译
人类语言学习者暴露于信息丰富的上下文敏感语言,但要大量的原始感觉数据。通过社会语言的使用和彩排和实践的内部过程,语言学习者能够建立高级的语义表示,以解释他们的看法。在这里,我们从人类中的“内在语音”过程中汲取灵感(Vygotsky,1934年),以更好地理解代理内语言在体现行为中的作用。首先,我们正式地将代理语音作为半监督问题,并开发了两种算法,这些算法能够以很少的标记语言数据进行视觉接地字幕。然后,我们通过实验计算不同量的标记数据的缩放曲线,并将数据效率与监督学习基线进行比较。最后,我们将演讲内部的语音纳入3D虚拟世界中运行的体现的移动操纵剂代理,并表明,只需多达150个附加图像标题,代理语音就可以操纵和回答有关的问题。一个没有任何相关任务经验的新对象(零射)。综上所述,我们的实验表明,对代理内部的语音进行建模有效,可以使体现的代理有效地学习新任务,而无需直接互动经验。
translated by 谷歌翻译
创建可以自然与人类互动的代理是人工智能(AI)研究中的共同目标。但是,评估这些互动是具有挑战性的:收集在线人类代理相互作用缓慢而昂贵,但更快的代理指标通常与交互式评估相关。在本文中,我们评估了这些现有评估指标的优点,并提出了一种新颖的评估方法,称为标准化测试套件(STS)。 STS使用从真实人类交互数据中挖掘出的行为方案。代理商请参阅重播方案上下文,接收指令,然后将控制权控制以脱机完成交互。记录这些代理的延续并将其发送给人类注释者以将其标记为成功或失败,并且根据其成功的连续性比例对代理进行排名。最终的ST是自然主义相互作用的快速,控制,可解释的和代表的。总的来说,STS巩固了我们许多标准评估指标中所需的许多值,从而使我们能够加速研究进展,以生产可以自然与人类互动的代理。可以在https://youtu.be/yr1tnggorgq上找到视频。
translated by 谷歌翻译
It would be useful for machines to use computers as humans do so that they can aid us in everyday tasks. This is a setting in which there is also the potential to leverage large-scale expert demonstrations and human judgements of interactive behaviour, which are two ingredients that have driven much recent success in AI. Here we investigate the setting of computer control using keyboard and mouse, with goals specified via natural language. Instead of focusing on hand-designed curricula and specialized action spaces, we focus on developing a scalable method centered on reinforcement learning combined with behavioural priors informed by actual human-computer interactions. We achieve state-of-the-art and human-level mean performance across all tasks within the MiniWob++ benchmark, a challenging suite of computer control problems, and find strong evidence of cross-task transfer. These results demonstrate the usefulness of a unified human-agent interface when training machines to use computers. Altogether our results suggest a formula for achieving competency beyond MiniWob++ and towards controlling computers, in general, as a human would.
translated by 谷歌翻译
来自科幻小说的普通愿景是机器人将有一天居住在我们的物理空间中,感知世界,才能协助我们的物理劳动力,并通过自然语言与我们沟通。在这里,我们研究如何使用虚拟环境的简化设计如何与人类自然交互的人工代理。我们表明,与自我监督学习的模拟世界中的人类交互的模仿学习足以产生我们称之为MIA的多模式互动剂,这成功与非对抗人类互动75%的时间。我们进一步确定了提高性能的架构和算法技术,例如分层动作选择。完全,我们的结果表明,模仿多模态,实时人类行为可以提供具有丰富的行为的富含性的令人生意的和令人惊讶的有效手段,然后可以为特定目的进行微调,从而铺设基础用于培训互动机器人或数字助理的能力。可以在https://youtu.be/zfgrif7my找到MIA的行为的视频
translated by 谷歌翻译